dcg_score (Discounted Cumulative Gain)#
dcg_score measures ranking quality: it rewards putting highly relevant items at the top, and it discounts lower ranks logarithmically.
If you care about “are the best things at the top?” (search, recommendations, multilabel ranking), DCG is a natural tool.
Learning goals#
understand the DCG@k definition and the log discount
build intuition with plots (per-rank contributions + cumulative curves)
implement dcg_score in NumPy (including tie-averaging like scikit-learn)
see how DCG/NDCG is used to tune a simple linear ranking model
know pros/cons and common pitfalls
Quick import#
from sklearn.metrics import dcg_score
Table of contents#
Definitions and notation
Intuition (plots)
NumPy implementation (+ sanity checks vs sklearn)
Using DCG/NDCG to optimize/tune a simple model
Pros, cons, pitfalls
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.metrics import dcg_score as sk_dcg_score
from sklearn.metrics import ndcg_score as sk_ndcg_score
from sklearn.model_selection import train_test_split
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(0)
1) Definitions and notation#
Setup (one query)#
For one query (or one sample), assume we have \(m\) items (documents, products, labels, …).
True gain (a.k.a. relevance): \(y_j\) for item \(j\)
(usually \(y_j \ge 0\), often an integer like 0/1/2/3)
Predicted score: \(s_j\) for item \(j\)
Let \(\pi\) be the permutation that sorts scores descending:
\(s_{\pi(1)} \ge s_{\pi(2)} \ge \dots \ge s_{\pi(m)}\)
Define a logarithmic discount at rank \(r\) (1-indexed):
\(d_r = \frac{1}{\log_b(r+1)}\)
\(b\) is the log base (scikit-learn uses log_base=2 by default)
with the default \(b = 2\), \(d_1 = 1/\log_2(2) = 1\), so the top rank is not discounted
DCG@k#
Discounted Cumulative Gain at cutoff \(k\) is:
\(\mathrm{DCG@}k = \sum_{r=1}^{k} y_{\pi(r)}\, d_r = \sum_{r=1}^{k} \frac{y_{\pi(r)}}{\log_b(r+1)}\)
Interpretation: each item’s gain is counted, but late ranks matter less.
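As a sanity check on the definition, here is a hand computation on a tiny made-up 3-item list, compared against scikit-learn:

```python
import numpy as np
from sklearn.metrics import dcg_score

# A made-up 3-item list: sorting by score puts gains [1, 2, 3] top to bottom.
y_true = np.array([[3.0, 1.0, 2.0]])
y_score = np.array([[0.2, 0.9, 0.5]])

# DCG@2 by hand: gain / log2(rank + 1) for the top two ranks.
manual = 1.0 / np.log2(2) + 2.0 / np.log2(3)
print(manual)                           # ≈ 2.2619
print(dcg_score(y_true, y_score, k=2))  # same value
```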
Important note: what is “gain”?#
In classical Information Retrieval, people often transform relevance levels with an exponential gain, e.g.
\(g(y) = 2^{y} - 1,\)
and compute DCG on \(g(y)\). scikit-learn’s dcg_score does not apply a gain transform: it assumes y_true already contains the gains you want.
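So if you want the classic exponential gain, apply it to y_true yourself before calling dcg_score. A minimal sketch with made-up relevance levels:

```python
import numpy as np
from sklearn.metrics import dcg_score

y_rel = np.array([[3.0, 0.0, 2.0, 1.0]])   # graded relevance levels
y_score = np.array([[0.9, 0.1, 0.7, 0.3]])

# Apply g(y) = 2^y - 1 yourself; sklearn treats y_true as the gain directly.
gains = 2.0 ** y_rel - 1.0
print("linear gain     :", dcg_score(y_rel, y_score))
print("exponential gain:", dcg_score(gains, y_score))
```

The exponential variant weights high relevance levels much more heavily, so it rewards getting the single best item to the top.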
Tie handling (scikit-learn default)#
If some predicted scores are tied, scikit-learn averages over all tie permutations. For a tied group \(T\) occupying ranks \(r..r+|T|-1\), the group contributes:
\(\Big(\frac{1}{|T|} \sum_{j \in T} y_j\Big) \sum_{t=r}^{r+|T|-1} d_t\)
This equals the expected DCG if you break ties uniformly at random.
Set ignore_ties=True only if you are sure there are no ties.
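The "expected DCG under random tie-breaking" claim is easy to verify empirically: jitter the tied scores by a tiny random amount, score each draw with ignore_ties=True, and average. A quick Monte Carlo check on a small made-up example:

```python
import numpy as np
from sklearn.metrics import dcg_score

rng = np.random.default_rng(0)
y_true = np.array([[10.0, 0.0, 0.0, 1.0, 5.0]])
y_score = np.array([[1.0, 0.0, 0.0, 0.0, 1.0]])  # two tied groups

# Break ties uniformly at random with tiny jitter, then average many draws.
draws = [
    dcg_score(y_true, y_score + 1e-9 * rng.random(y_score.shape), ignore_ties=True)
    for _ in range(2000)
]
print("tie-averaged:", dcg_score(y_true, y_score))  # exact expectation
print("Monte Carlo :", np.mean(draws))              # should be close
```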
Normalization (NDCG)#
Raw DCG depends on the scale of gains and the number of items. A common normalization is:
\(\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}\)
where \(\mathrm{IDCG@}k\) is the DCG@k of the ideal ranking (items sorted by decreasing \(y\)). NDCG is typically in \([0,1]\) when gains are non-negative.
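NDCG can be reproduced from two DCG calls, which makes the definition concrete (made-up arrays; note ignore_ties=True for the ideal ranking, matching what scikit-learn does internally):

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score

y_true = np.array([[3.0, 2.0, 3.0, 0.0, 1.0, 2.0]])
y_score = np.array([[0.1, 0.2, 0.3, 0.95, 0.8, 0.7]])

dcg = dcg_score(y_true, y_score, k=3)
# IDCG: score the true gains against themselves, ignoring ties in y_true.
idcg = dcg_score(y_true, y_true, k=3, ignore_ties=True)
print(dcg / idcg)
print(ndcg_score(y_true, y_score, k=3))  # identical
```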
2) Intuition (plots)#
We’ll use a toy query with 6 items.
We’ll compare two rankings:
a good model puts high-gain items near the top
a bad model pushes good items down
We’ll visualize:
per-rank gain, discount, and contribution
cumulative DCG curves
DCG@k as a function of the cutoff \(k\)
doc_ids = np.array([f"doc_{i}" for i in range(6)])
# Gains (true relevance). In sklearn, these are used directly (no gain transform).
y_true_1q = np.array([3, 2, 3, 0, 1, 2], dtype=float)
# Two different score vectors (both induce a ranking).
y_score_good = np.array([0.90, 0.70, 0.80, 0.05, 0.30, 0.60])
y_score_bad = np.array([0.10, 0.20, 0.30, 0.95, 0.80, 0.70])
Y_true = y_true_1q[None, :]
print("sklearn DCG (good):", sk_dcg_score(Y_true, y_score_good[None, :]))
print("sklearn DCG (bad): ", sk_dcg_score(Y_true, y_score_bad[None, :]))
print("sklearn NDCG (good):", sk_ndcg_score(Y_true, y_score_good[None, :]))
print("sklearn NDCG (bad): ", sk_ndcg_score(Y_true, y_score_bad[None, :]))
for k in [1, 3, 5]:
print(f"DCG@{k} (good): {sk_dcg_score(Y_true, y_score_good[None, :], k=k):.4f}")
print(f"DCG@{k} (bad): {sk_dcg_score(Y_true, y_score_bad[None, :], k=k):.4f}")
sklearn DCG (good): 7.140995184095699
sklearn DCG (bad): 4.765286603584786
sklearn NDCG (good): 0.9999999999999998
sklearn NDCG (bad): 0.6673140760825534
DCG@1 (good): 3.0000
DCG@1 (bad): 0.0000
DCG@3 (good): 5.8928
DCG@3 (bad): 1.6309
DCG@5 (good): 7.1410
DCG@5 (bad): 3.6967
def dcg_rank_contributions(y_true_1d, y_score_1d, *, k=None, log_base=2, doc_ids=None):
"""Return rank-wise DCG components for a single query.
This ignores tie-averaging and is meant for visualization.
"""
y_true_1d = np.asarray(y_true_1d, dtype=float)
y_score_1d = np.asarray(y_score_1d, dtype=float)
if y_true_1d.shape != y_score_1d.shape:
raise ValueError("y_true_1d and y_score_1d must have the same shape")
n_labels = y_true_1d.size
order = np.argsort(y_score_1d)[::-1]
gains = y_true_1d[order]
scores = y_score_1d[order]
ranks = np.arange(1, n_labels + 1)
discount = 1.0 / (np.log(ranks + 1) / np.log(log_base))
if k is not None:
discount[k:] = 0.0
contrib = gains * discount
cum_dcg = np.cumsum(contrib)
if doc_ids is None:
doc_ids = np.arange(n_labels)
doc_ids = np.asarray(doc_ids)
return {
"rank": ranks,
"doc": doc_ids[order],
"gain": gains,
"score": scores,
"discount": discount,
"contrib": contrib,
"cum_dcg": cum_dcg,
}
def plot_dcg_breakdown(components, *, title="DCG breakdown"):
ranks = components["rank"]
fig = make_subplots(
rows=2,
cols=1,
shared_xaxes=True,
vertical_spacing=0.14,
specs=[[{"secondary_y": True}], [{"secondary_y": True}]],
)
fig.add_trace(
go.Bar(
x=ranks,
y=components["gain"],
name="gain (y_true)",
hovertemplate="rank=%{x}<br>gain=%{y}<br>doc=%{customdata[0]}<br>score=%{customdata[1]:.3f}<extra></extra>",
customdata=np.c_[components["doc"], components["score"]],
),
row=1,
col=1,
secondary_y=False,
)
fig.add_trace(
go.Scatter(
x=ranks,
y=components["discount"],
mode="lines+markers",
name="discount",
hovertemplate="rank=%{x}<br>discount=%{y:.3f}<extra></extra>",
),
row=1,
col=1,
secondary_y=True,
)
fig.add_trace(
go.Bar(
x=ranks,
y=components["contrib"],
name="gain × discount",
hovertemplate="rank=%{x}<br>contrib=%{y:.3f}<extra></extra>",
),
row=2,
col=1,
secondary_y=False,
)
fig.add_trace(
go.Scatter(
x=ranks,
y=components["cum_dcg"],
mode="lines+markers",
name="cumulative DCG",
hovertemplate="rank=%{x}<br>cum_dcg=%{y:.3f}<extra></extra>",
),
row=2,
col=1,
secondary_y=True,
)
fig.update_xaxes(title_text="rank (1 = top)", row=2, col=1)
fig.update_yaxes(title_text="gain", row=1, col=1, secondary_y=False)
fig.update_yaxes(title_text="discount", row=1, col=1, secondary_y=True)
fig.update_yaxes(title_text="contribution", row=2, col=1, secondary_y=False)
fig.update_yaxes(title_text="cumulative DCG", row=2, col=1, secondary_y=True)
fig.update_layout(title=title, legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0))
return fig
good = dcg_rank_contributions(y_true_1q, y_score_good, k=None, doc_ids=doc_ids)
bad = dcg_rank_contributions(y_true_1q, y_score_bad, k=None, doc_ids=doc_ids)
plot_dcg_breakdown(good, title="Good ranking: per-rank DCG contributions").show()
plot_dcg_breakdown(bad, title="Bad ranking: per-rank DCG contributions").show()
def cumulative_dcg_curve(y_true_1d, y_score_1d, *, log_base=2):
comps = dcg_rank_contributions(y_true_1d, y_score_1d, k=None, log_base=log_base)
return comps["rank"], comps["cum_dcg"]
r, dcg_good = cumulative_dcg_curve(y_true_1q, y_score_good)
_, dcg_bad = cumulative_dcg_curve(y_true_1q, y_score_bad)
_, dcg_ideal = cumulative_dcg_curve(y_true_1q, y_true_1q) # ideal ordering
fig = go.Figure()
fig.add_trace(go.Scatter(x=r, y=dcg_good, mode="lines+markers", name="good"))
fig.add_trace(go.Scatter(x=r, y=dcg_bad, mode="lines+markers", name="bad"))
fig.add_trace(go.Scatter(x=r, y=dcg_ideal, mode="lines+markers", name="ideal (IDCG)"))
fig.update_layout(
title="Cumulative DCG vs rank",
xaxis_title="rank (1 = top)",
yaxis_title="cumulative DCG",
)
fig.show()
ks = np.arange(1, len(y_true_1q) + 1)
dcg_good_k = [sk_dcg_score(Y_true, y_score_good[None, :], k=int(k)) for k in ks]
dcg_bad_k = [sk_dcg_score(Y_true, y_score_bad[None, :], k=int(k)) for k in ks]
idcg_k = [sk_dcg_score(Y_true, Y_true, k=int(k), ignore_ties=True) for k in ks]
ndcg_good_k = [g / i if i > 0 else 0.0 for g, i in zip(dcg_good_k, idcg_k)]
ndcg_bad_k = [g / i if i > 0 else 0.0 for g, i in zip(dcg_bad_k, idcg_k)]
fig = make_subplots(rows=1, cols=2, subplot_titles=["DCG@k", "NDCG@k"], shared_xaxes=True)
fig.add_trace(go.Scatter(x=ks, y=dcg_good_k, mode="lines+markers", name="good"), row=1, col=1)
fig.add_trace(go.Scatter(x=ks, y=dcg_bad_k, mode="lines+markers", name="bad"), row=1, col=1)
fig.add_trace(go.Scatter(x=ks, y=ndcg_good_k, mode="lines+markers", name="good"), row=1, col=2)
fig.add_trace(go.Scatter(x=ks, y=ndcg_bad_k, mode="lines+markers", name="bad"), row=1, col=2)
fig.update_xaxes(title_text="k (cutoff)")
fig.update_yaxes(title_text="score")
fig.update_layout(title="Effect of cutoff k", legend=dict(orientation="h", yanchor="bottom", y=1.08, xanchor="left", x=0))
fig.show()
3) NumPy implementation (including tie averaging)#
We’ll implement a drop-in equivalent of scikit-learn’s dcg_score:
works on y_true and y_score shaped (n_samples, n_labels)
supports k, log_base, sample_weight, and ignore_ties
matches scikit-learn tie-averaging when ignore_ties=False
def _as_2d(a, *, name):
a = np.asarray(a)
if a.ndim == 1:
a = a[None, :]
if a.ndim != 2:
raise ValueError(f"{name} must be 1D or 2D, got shape {a.shape}")
return a
def _discount_vector(n_labels, *, k=None, log_base=2):
if log_base <= 1:
raise ValueError("log_base must be > 1")
ranks0 = np.arange(n_labels)
discount = 1.0 / (np.log(ranks0 + 2.0) / np.log(log_base))
if k is not None:
if not (1 <= int(k) <= n_labels):
raise ValueError(f"k must be in [1, {n_labels}], got {k}")
discount[int(k) :] = 0.0
return discount
def _tie_averaged_dcg_1d(y_true_1d, y_score_1d, discount_cumsum):
"""DCG for one sample by averaging over all permutations of tied scores.
Matches sklearn.metrics._ranking._tie_averaged_dcg.
"""
_, inv, counts = np.unique(-y_score_1d, return_inverse=True, return_counts=True)
mean_gain_per_group = np.zeros(len(counts), dtype=float)
np.add.at(mean_gain_per_group, inv, y_true_1d)
mean_gain_per_group /= counts
groups_end = np.cumsum(counts) - 1
discount_sums = np.empty(len(counts), dtype=float)
discount_sums[0] = discount_cumsum[groups_end[0]]
discount_sums[1:] = np.diff(discount_cumsum[groups_end])
return float((mean_gain_per_group * discount_sums).sum())
def _dcg_sample_scores_numpy(y_true, y_score, *, k=None, log_base=2, ignore_ties=False):
y_true = np.asarray(y_true, dtype=float)
y_score = np.asarray(y_score, dtype=float)
if y_true.shape != y_score.shape:
raise ValueError(f"y_true and y_score must have the same shape, got {y_true.shape} vs {y_score.shape}")
n_samples, n_labels = y_true.shape
discount = _discount_vector(n_labels, k=k, log_base=log_base)
if ignore_ties:
ranking = np.argsort(y_score, axis=1)[:, ::-1]
ranked_true = y_true[np.arange(n_samples)[:, None], ranking]
return discount @ ranked_true.T
discount_cumsum = np.cumsum(discount)
out = np.empty(n_samples, dtype=float)
for i in range(n_samples):
out[i] = _tie_averaged_dcg_1d(y_true[i], y_score[i], discount_cumsum)
return out
def dcg_score_numpy(y_true, y_score, *, k=None, log_base=2, sample_weight=None, ignore_ties=False, return_per_sample=False):
"""NumPy implementation of sklearn.metrics.dcg_score."""
y_true = _as_2d(y_true, name="y_true")
y_score = _as_2d(y_score, name="y_score")
scores = _dcg_sample_scores_numpy(y_true, y_score, k=k, log_base=log_base, ignore_ties=ignore_ties)
if return_per_sample:
return scores
if sample_weight is None:
return float(np.mean(scores))
sample_weight = np.asarray(sample_weight, dtype=float)
if sample_weight.shape != (scores.shape[0],):
raise ValueError(f"sample_weight must have shape ({scores.shape[0]},), got {sample_weight.shape}")
return float(np.average(scores, weights=sample_weight))
def ndcg_score_numpy(y_true, y_score, *, k=None, sample_weight=None, ignore_ties=False, return_per_sample=False):
"""NumPy implementation of sklearn.metrics.ndcg_score (built from DCG)."""
y_true = _as_2d(y_true, name="y_true")
y_score = _as_2d(y_score, name="y_score")
gain = _dcg_sample_scores_numpy(y_true, y_score, k=k, log_base=2, ignore_ties=ignore_ties)
ideal = _dcg_sample_scores_numpy(y_true, y_true, k=k, log_base=2, ignore_ties=True)
out = np.zeros_like(gain, dtype=float)
mask = ideal != 0
out[mask] = gain[mask] / ideal[mask]
if return_per_sample:
return out
if sample_weight is None:
return float(np.mean(out))
sample_weight = np.asarray(sample_weight, dtype=float)
if sample_weight.shape != (out.shape[0],):
raise ValueError(f"sample_weight must have shape ({out.shape[0]},), got {sample_weight.shape}")
return float(np.average(out, weights=sample_weight))
# Sanity checks vs scikit-learn
Yt = rng.integers(0, 4, size=(50, 12))
Ys = rng.normal(size=(50, 12))
for ignore_ties in [True, False]:
for k in [None, 5]:
for base in [2, 10]:
sk = sk_dcg_score(Yt, Ys, k=k, log_base=base, ignore_ties=ignore_ties)
np_ = dcg_score_numpy(Yt, Ys, k=k, log_base=base, ignore_ties=ignore_ties)
print(f"ignore_ties={ignore_ties} k={k} base={base} | abs diff={abs(sk - np_):.12f}")
# A tie-heavy example (by rounding scores)
Ys_ties = np.round(Ys, 1)
sk = sk_dcg_score(Yt, Ys_ties, k=5, ignore_ties=False)
np_ = dcg_score_numpy(Yt, Ys_ties, k=5, ignore_ties=False)
print("\nTie-heavy check | sklearn:", sk, "numpy:", np_, "abs diff:", abs(sk - np_))
# The classic tied-top example from sklearn docs
true_relevance = np.asarray([[10, 0, 0, 1, 5]])
scores = np.asarray([[1, 0, 0, 0, 1]])
print("\nTies at the top (k=1)")
print("sklearn (tie-avg): ", sk_dcg_score(true_relevance, scores, k=1))
print("numpy (tie-avg): ", dcg_score_numpy(true_relevance, scores, k=1))
print("sklearn (ignore): ", sk_dcg_score(true_relevance, scores, k=1, ignore_ties=True))
print("numpy (ignore): ", dcg_score_numpy(true_relevance, scores, k=1, ignore_ties=True))
ignore_ties=True k=None base=2 | abs diff=0.000000000000
ignore_ties=True k=None base=10 | abs diff=0.000000000000
ignore_ties=True k=5 base=2 | abs diff=0.000000000000
ignore_ties=True k=5 base=10 | abs diff=0.000000000000
ignore_ties=False k=None base=2 | abs diff=0.000000000000
ignore_ties=False k=None base=10 | abs diff=0.000000000000
ignore_ties=False k=5 base=2 | abs diff=0.000000000000
ignore_ties=False k=5 base=10 | abs diff=0.000000000000
Tie-heavy check | sklearn: 4.726341544169852 numpy: 4.726341544169852 abs diff: 0.0
Ties at the top (k=1)
sklearn (tie-avg): 7.5
numpy (tie-avg): 7.5
sklearn (ignore): 5.0
numpy (ignore): 5.0
4) Using DCG/NDCG to optimize/tune a simple model#
DCG depends on sorting, so it’s not smoothly differentiable in the usual sense. In practice, you either:
optimize a differentiable surrogate loss (pointwise/pairwise/listwise) and evaluate with NDCG/DCG
or use a black-box optimizer to directly maximize NDCG/DCG
Here we’ll take the second route with a simple linear scoring model
\(s_j = w^\top x_j\)
and optimize \(w\) by coordinate ascent to maximize mean NDCG@k on a toy ranking dataset.
def make_synthetic_ranking_data(*, n_queries=400, n_docs=12, n_features=6, noise=0.7, seed=0):
rng = np.random.default_rng(seed)
w_true = rng.normal(size=n_features)
X = rng.normal(size=(n_queries, n_docs, n_features))
raw = X @ w_true + noise * rng.normal(size=(n_queries, n_docs))
# Convert raw scores to graded relevance per query using quantiles.
q50 = np.quantile(raw, 0.50, axis=1, keepdims=True)
q75 = np.quantile(raw, 0.75, axis=1, keepdims=True)
q90 = np.quantile(raw, 0.90, axis=1, keepdims=True)
y = np.zeros_like(raw, dtype=int)
y += raw >= q50
y += raw >= q75
y += raw >= q90
return X, y.astype(float), w_true
X, y_true, w_true = make_synthetic_ranking_data()
X_train, X_val, y_train, y_val = train_test_split(X, y_true, test_size=0.30, random_state=0)
print("X_train:", X_train.shape, "y_train:", y_train.shape)
print("relevance levels in y:", np.unique(y_true))
X_train: (280, 12, 6) y_train: (280, 12)
relevance levels in y: [0. 1. 2. 3.]
def mean_ndcg_at_k(y_true, y_score, *, k=5):
return ndcg_score_numpy(y_true, y_score, k=k, ignore_ties=False)
def coordinate_ascent_ndcg(
X_train,
y_train,
X_val,
y_val,
*,
k=5,
n_passes=30,
step=0.5,
step_decay=0.9,
init_scale=0.2,
seed=0,
):
rng = np.random.default_rng(seed)
n_features = X_train.shape[-1]
w = rng.normal(scale=init_scale, size=n_features)
w0 = w.copy()
def score(X, w):
return X @ w
def evaluate(X, y, w):
return mean_ndcg_at_k(y, score(X, w), k=k)
history = {
"pass": [],
"step": [],
"train_ndcg": [],
"val_ndcg": [],
}
best_val = -np.inf
best_w = w.copy()
for p in range(n_passes):
improved = False
base_train = evaluate(X_train, y_train, w)
for j in range(n_features):
best_local = base_train
best_delta = 0.0
for delta in (+step, -step):
w_try = w.copy()
w_try[j] += delta
val = evaluate(X_train, y_train, w_try)
if val > best_local + 1e-12:
best_local = val
best_delta = delta
if best_delta != 0.0:
w[j] += best_delta
base_train = best_local
improved = True
train_ndcg = evaluate(X_train, y_train, w)
val_ndcg = evaluate(X_val, y_val, w)
history["pass"].append(p)
history["step"].append(step)
history["train_ndcg"].append(train_ndcg)
history["val_ndcg"].append(val_ndcg)
if val_ndcg > best_val:
best_val = val_ndcg
best_w = w.copy()
step *= step_decay
if not improved:
# No single-coordinate improvement found at this step size.
break
return w0, best_w, history
w0, w_best, hist = coordinate_ascent_ndcg(X_train, y_train, X_val, y_val, k=5, n_passes=40, step=0.6, step_decay=0.9, seed=1)
print("initial val NDCG@5:", mean_ndcg_at_k(y_val, X_val @ w0, k=5))
print("best val NDCG@5:", mean_ndcg_at_k(y_val, X_val @ w_best, k=5))
print("true val NDCG@5:", mean_ndcg_at_k(y_val, X_val @ w_true, k=5))
initial val NDCG@5: 0.33609233877721406
best val NDCG@5: 0.8092268103302992
true val NDCG@5: 0.8086706878351096
fig = go.Figure()
fig.add_trace(go.Scatter(x=hist["pass"], y=hist["train_ndcg"], mode="lines+markers", name="train NDCG@5"))
fig.add_trace(go.Scatter(x=hist["pass"], y=hist["val_ndcg"], mode="lines+markers", name="val NDCG@5"))
fig.update_layout(
title="Coordinate ascent on NDCG@5 (linear scoring model)",
xaxis_title="pass",
yaxis_title="NDCG@5",
)
fig.show()
# Visualize one query: relevance by predicted rank (before vs after)
q = 0
yq = y_val[q]
s0 = (X_val[q] @ w0)
s1 = (X_val[q] @ w_best)
comp0 = dcg_rank_contributions(yq, s0, k=5)
comp1 = dcg_rank_contributions(yq, s1, k=5)
fig = make_subplots(rows=1, cols=2, subplot_titles=["initial ranking", "optimized ranking"], shared_yaxes=True)
fig.add_trace(go.Bar(x=comp0["rank"], y=comp0["gain"], name="gain"), row=1, col=1)
fig.add_trace(go.Bar(x=comp1["rank"], y=comp1["gain"], name="gain"), row=1, col=2)
fig.update_xaxes(title_text="rank (1 = top)")
fig.update_yaxes(title_text="gain (true relevance)")
fig.update_layout(title="One validation query: gains by predicted rank (k=5)")
fig.show()
print("Query DCG@5 (initial): ", dcg_score_numpy(yq, s0, k=5))
print("Query DCG@5 (optimized):", dcg_score_numpy(yq, s1, k=5))
print("Query NDCG@5 (initial): ", ndcg_score_numpy(yq, s0, k=5))
print("Query NDCG@5 (optimized):", ndcg_score_numpy(yq, s1, k=5))
Query DCG@5 (initial): 1.0
Query DCG@5 (optimized): 4.517782560805999
Query NDCG@5 (initial): 0.14902422012004707
Query NDCG@5 (optimized): 0.6732590227960631
5) Pros, cons, pitfalls#
Pros#
Ranking-first: directly measures whether relevant items are near the top.
Top-heavy: logarithmic discount matches many UX settings (users rarely scroll far).
Graded relevance: naturally supports 0/1/2/3 relevance, not just binary.
Cutoff-friendly: DCG@k focuses evaluation on the top-\(k\) you actually show.
Cons#
Not normalized: raw DCG depends on the gain scale and number of items; comparing across queries is hard (use NDCG).
Non-smooth objective: depends on sorting; direct gradient-based optimization is non-trivial.
Ignores calibration: only the ordering matters (monotonic transforms of scores keep DCG unchanged unless they create ties).
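The "only the ordering matters" point is easy to demonstrate: any strictly increasing transform of the scores leaves DCG unchanged, because it preserves the sort. A quick check on random data:

```python
import numpy as np
from sklearn.metrics import dcg_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=(1, 8)).astype(float)
y_score = rng.normal(size=(1, 8))

base = dcg_score(y_true, y_score)
# Strictly increasing transforms preserve the ranking, hence the DCG.
for f in (lambda s: 3.0 * s + 1.0, np.tanh, np.exp):
    print(abs(dcg_score(y_true, f(y_score)) - base))  # ~0 for each
```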
Common pitfalls#
Mixing queries: DCG is defined per query (a list). Averaging over unrelated items can be meaningless.
Ties: if your model outputs few distinct scores, tie-handling matters; ignore_ties=True can be wrong.
Gain definition: if you expect exponential gain (e.g., \(2^y-1\)), you must transform y_true yourself.
Negative gains: NDCG may fall outside \([0,1]\) if y_true contains negative values.
Where DCG/NDCG is especially useful#
search result ranking
recommendation lists
multilabel problems where you care about the ordering of labels
Exercises#
Modify dcg_score_numpy to accept a gain_fn and compare linear vs exponential gains.
Try different values of k and observe how the coordinate ascent solution changes.
Create a dataset where optimizing a pointwise loss (e.g., MSE on gains) does not improve NDCG@k much. Why?
References#
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques.
scikit-learn:
sklearn.metrics.dcg_score, sklearn.metrics.ndcg_score